Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

Authors

  • Lei La
  • Qiao Guo
  • Dequan Yang
  • Qimin Cao
Abstract

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization are challenged by methods based on support vector machines (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multiclass AdaBoost algorithm that avoids reducing the multiclass classification problem to multiple two-class classification problems. The novel method is more effective and keeps the accuracy advantage of existing AdaBoost. An adaptive group-based kNN method is proposed to build more accurate weak classifiers and thereby keep the number of base classifiers within an acceptable range. To further enhance performance, the weak classifiers are combined into a strong classifier through a double iterative weighting scheme, yielding an adaptive group-based kNN boosting algorithm (AGkNN-AdaBoost). We implement AGkNN-AdaBoost in a Chinese text categorization system. Experimental results show that the proposed classification algorithm achieves better precision and recall than many other text categorization methods, including traditional AdaBoost. In addition, its processing speed is significantly higher than that of the original AdaBoost and many other classic categorization algorithms.
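The abstract does not spell out the AGkNN-AdaBoost procedure, so the following is not the authors' algorithm. It is a minimal sketch of the general idea of boosting kNN weak classifiers directly on a multiclass problem, without splitting it into two-class problems. It swaps in SAMME, a standard multiclass AdaBoost variant, and a plain kNN vote over a weighted sample of training examples. The function names, the per-class sampling rule, and all parameter defaults are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def knn_predict(prototypes, x, k):
    """Majority vote among the k prototypes closest to x (squared Euclidean)."""
    nearest = sorted(
        prototypes,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def draw_prototypes(X, y, weights, m, rng):
    """Weighted sample of about m examples, with at least one per class.

    The per-class guarantee is a hypothetical choice for this sketch,
    not something stated in the abstract.
    """
    idx = []
    for c in sorted(set(y)):
        members = [i for i in range(len(y)) if y[i] == c]
        member_weights = [weights[i] for i in members]
        idx.append(rng.choices(members, weights=member_weights, k=1)[0])
    extra = max(0, m - len(idx))
    idx += rng.choices(range(len(y)), weights=weights, k=extra)
    return [(X[i], y[i]) for i in idx]


def fit_samme_knn(X, y, rounds=5, k=3, subset_size=None, seed=0):
    """Train a SAMME ensemble whose weak classifiers are small kNN models."""
    rng = random.Random(seed)
    n = len(X)
    n_classes = len(set(y))  # assumed to be at least 2
    m = subset_size or max(k, n // 2)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        prototypes = draw_prototypes(X, y, weights, m, rng)
        miss = [knn_predict(prototypes, X[i], k) != y[i] for i in range(n)]
        err = sum(w for w, bad in zip(weights, miss) if bad)
        if err >= 1.0 - 1.0 / n_classes:
            continue  # no better than random guessing: discard this round
        err = max(err, 1e-10)  # avoid division by zero for a perfect round
        # SAMME weight: the log(n_classes - 1) term is the multiclass correction.
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        ensemble.append((alpha, prototypes))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(alpha) if bad else w
                   for w, bad in zip(weights, miss)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble, k


def predict(model, x):
    """Return the class with the largest alpha-weighted vote."""
    ensemble, k = model
    scores = Counter()
    for alpha, prototypes in ensemble:
        scores[knn_predict(prototypes, x, k)] += alpha
    return scores.most_common(1)[0][0]


# Toy usage: three well-separated 2-D clusters standing in for document vectors.
X = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (0, 10), (1, 10), (0, 11)]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
model = fit_samme_knn(X, y, rounds=4, k=1, seed=1)
print(predict(model, (10.5, 0.5)))
```

In a text categorization setting the tuples in `X` would be replaced by document feature vectors (for example term weights), and the distance function would likely be replaced by one better suited to sparse text features.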

Similar resources

Boostexter: a System for Multiclass Multi-label Text Categorization

This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. We first show how to extend the standard notion of classification by allowing each instance to be associated with multiple labels. We then discuss our approach for multiclass multi-label text categorization which is based on a new and improved family of boosting algorithms. We desc...

BoosTexter: A Boosting-based System for Text Categorization

This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other t...

An kNN Model-Based Approach and Its Application in Text Categorization

An investigation has been conducted on two well known similarity-based learning approaches to text categorization. This includes the k-nearest neighbor (kNN) classifier and the Rocchio classifier. After identifying the weakness and strength of each technique, we propose a new classifier called the kNN model-based classifier by unifying the strengths of k-NN and Rocchio classifier and adapting t...

Linear kernel combination using boosting

In this paper, we propose a novel algorithm to design multiclass kernels based on an iterative combination of weak kernels, in a scheme inspired by the boosting framework. Our solution has complexity linear in the training set size. We evaluate our method for classification on a toy example by integrating our multiclass kernel into a kNN classifier and comparing our results with a referen...

Inverted Index based Modified Version of KNN for Text Categorization

This research proposes a new strategy in which documents are encoded into string vectors and a modified version of KNN is adapted to string vectors for text categorization. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area of pattern classification. For example, in...

Publication date: 2014